Pritchard, Stephens, and Donnelly on Population Structure.

Author

  • John Novembre
Abstract

In essentially any species, genetic similarity among individuals is structured by the existence of subgroups and geographic isolation. For researchers, understanding this population structure can be of direct interest, a necessary waystation to further analyses, or a confounding nuisance. Regardless of the motivation, understanding population structure is an essential step for population genetic analysis. In 2000, Pritchard, Stephens, and Donnelly published one of the most widespread and important frameworks for addressing this task: the model-based clustering method known as STRUCTURE (Pritchard et al. 2000).
The birth of the method owes much to having the right expertise in a room. In September of 1998, Pritchard arrived for a postdoc in Oxford just as a workshop at the Newton Institute in Cambridge was starting. At the time, he was finishing up work on a test for cryptic population structure in disease association studies (Pritchard and Rosenberg 1999) and had become interested in clustering related individuals. At the workshop, he shared these interests with his new advisor, Peter Donnelly, and fellow postdoc Matthew Stephens. Donnelly was deeply experienced with Bayesian models in genetics, including models for forensic samples of uncertain origins (e.g., Balding and Donnelly 1995), and Stephens had written his PhD dissertation on Bayesian clustering using Markov chain Monte Carlo techniques (Stephens 2000). Bringing these backgrounds together, they hashed out the first model on a board within a couple of hours. Pritchard then implemented it with the help of Stephens over the next several days. The prototype worked well, and there were no real changes from what was written out in the first meeting to what was ultimately published (J. K. Pritchard and P. Donnelly, personal communication).
The novelty was in taking a Bayesian approach that assigns individuals to source populations or allows them to have proportional assignment of their ancestry to multiple populations (the “admixture model”). Their work followed that of those who had developed likelihood-based individual assignment to populations (Paetkau et al. 1995; Rannala and Mountain 1997), population mixture models (Smouse et al. 1990), and Bayesian models of cryptic population structure (Foreman et al. 1997; Roeder et al. 1998). The resulting STRUCTURE method had a tremendous impact in human genetics, evolutionary genetics, and molecular ecology, and went on to be highly cited and awarded (Noor 2013). The admixture model is also used widely in machine learning, where it is known as latent Dirichlet allocation (Blei et al. 2003). In addition, in the late 1990s, coalescent-based approaches dominated population genetic methods, and in this milieu the impact of STRUCTURE reinforced the point that relatively simple models can have tremendous utility. After the initial publication, Pritchard and colleagues extended the work to address ancestry along chromosomes (Falush et al. 2003), dominant markers and null alleles (Falush et al. 2007), and prior group information (Hubisz et al. 2009). Others developed extensions that, for instance, carry out assignment to hybrid categories (Anderson and Thompson 2002) or assume a spatial distribution for populations (Guillot et al. 2005; François et al.
Copyright © 2016 by the Genetics Society of America. doi: 10.1534/genetics.116.195164. Address for correspondence: Department of Ecology and Evolutionary Biology, University of Chicago, Chicago, IL 60637. E-mail: [email protected]

Similar resources

SNP-based analysis of genetic substructure in the German population.

OBJECTIVE To evaluate the relevance and necessity to account for the effects of population substructure on association studies under a case-control design in central Europe, we analysed three samples drawn from different geographic areas of Germany. Two of the three samples, POPGEN (n = 720) and SHIP (n = 709), are from north and north-east Germany, respectively, and one sample, KORA (n = 730),...

Probabilistic models of genetic variation in structured populations applied to global human studies

MOTIVATION Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral p...

Inference of population structure using multilocus genotype data.

We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or...

The coalescent and its descendants

The coalescent revolutionised theoretical population genetics, simplifying, or making possible for the first time, many analyses, proofs, and derivations, and offering crucial insights about the way in which the structure of data in samples from populations depends on the demographic history of the population. However statistical inference under the coalescent model is extremely challenging, ef...

Journal title:
  • Genetics

Volume 204, Issue 2

Pages: -

Publication date: 2016